Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#313 enabled fix differing versions of dependency scikit learn #316

Merged

Conversation

ckunki
Copy link
Contributor

@ckunki ckunki commented Aug 14, 2024

Closes #313

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@ckunki
Copy link
Contributor Author

ckunki commented Aug 16, 2024

During nbtests in CI we observed the error message

E    ValueError: The feature names should match those that were passed during fit.
E    Feature names unseen at fit time:
E    - 10
E    - 11
E    - 2
E    - 3
E    - 4
E    - ...
E    Feature names seen at fit time, yet now missing:
E    - FALPHA
E    - FASYM
E    - FCONC
E    - FCONC1
E    - FDIST
E    - ...

[CodeBuild]
@ckunki ckunki had a problem deploying to approve-test-execution August 16, 2024 11:41 — with GitHub Actions Failure
[CodeBuild]
@@ -0,0 +1,108 @@
{
Copy link
Collaborator

@ahsimb ahsimb Aug 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The specific version of the library used by the default SLC depends ...

"AI-Lab adapts its own version of the library if required." I don't think it's a good explanation of what is happening here. AI-Lab itself is not doing anything. The user should run the script like the one in this notebook to make sure the versions are aligned.


Reply via ReviewNB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the paragraph in the next push.
Please review again.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ahsimb thx, that was also what I meant needs to be formulated differently and @ckunki for changing

@@ -0,0 +1,108 @@
{
Copy link
Collaborator

@ahsimb ahsimb Aug 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you split this code cell in two parts and add markdown cells for each part explaining the code?

In the first part we create a UDF that reads the version of the scikit_learn library. The UDF runs inside the language container, therefore the library version detected by the UDF is the version installed in this container.

In the second part we compare the version returned by the UDF with the version in the AI-Lab environment. If they differ we install the UDF's version in the AI-Lab environment.

BTW, do we need to run pip with the --upgrade option?


Reply via ReviewNB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, see next push.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, do we need to run pip with the --upgrade option?

That I cannot answer. My tests have been successful without this option.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

@@ -91,7 +91,7 @@
"\n",
Copy link
Collaborator

@ahsimb ahsimb Aug 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #12.    model.fit(X_train.values, y_train)

Why? The model is happy to take a DataFrame as an input.


Reply via ReviewNB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We observed an error message as specified in comment-2293085752.
The notebook sklearn_train_abalone.ipynb did use .values already.

Copy link
Collaborator

@ahsimb ahsimb Aug 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe earlier versions of scikit-learn didn't care about feature names. It seems that when we read input in udf like ctx.get_dataframe(num_rows=1000, start_col=1) we do not get the column names in the pandas DataFrame. That's strange.

@ckunki ckunki merged commit fd290cd into main Aug 20, 2024
10 checks passed
@ckunki ckunki deleted the bug/#313-Enabled_fix_differing_versions_of_dependency_scikit-learn branch August 20, 2024 08:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix differing versions of dependency scikit-learn
3 participants